Pull request overview
Adds a new remote seed dataset loader for the BeaverTails HuggingFace dataset, making it discoverable via SeedDatasetProvider and documenting its availability.
Changes:
- Introduces the _BeaverTailsDataset remote loader with optional unsafe_only filtering (default: unsafe only).
- Registers the loader in the remote datasets module and adds unit tests for filtering behavior.
- Updates the “Loading Built-in Datasets” notebook output to include the new dataset name.
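The filtering behavior summarized above can be sketched without the library. This is a minimal illustration that assumes rows shaped like the ones the loader reads (`prompt`, `is_safe`, and a `category` mapping of name to flag); `select_prompts`, `SAMPLE_ROWS`, and the category names are hypothetical and are not part of the pull request.

```python
from typing import Any


def select_prompts(rows: list[dict[str, Any]], unsafe_only: bool = True) -> list[dict[str, Any]]:
    """Return prompt records, skipping rows marked safe when unsafe_only is True."""
    selected = []
    for row in rows:
        if unsafe_only and row["is_safe"]:
            continue
        # Keep only the category names whose flag is truthy.
        harm_categories = [name for name, flagged in row["category"].items() if flagged]
        selected.append({"prompt": row["prompt"], "harm_categories": harm_categories})
    return selected


# Hypothetical rows; the category names are placeholders, not the dataset's labels.
SAMPLE_ROWS = [
    {"prompt": "p1", "is_safe": True, "category": {"category_a": False, "category_b": False}},
    {"prompt": "p2", "is_safe": False, "category": {"category_a": True, "category_b": False}},
]

unsafe_rows = select_prompts(SAMPLE_ROWS)
all_rows = select_prompts(SAMPLE_ROWS, unsafe_only=False)
```

With the default, only the row marked unsafe survives; passing `unsafe_only=False` keeps both rows, and a safe row simply ends up with an empty category list.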
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pyrit/datasets/seed_datasets/remote/beaver_tails_dataset.py | New HuggingFace-backed loader that converts BeaverTails rows into SeedPrompts (unsafe-only by default). |
| pyrit/datasets/seed_datasets/remote/__init__.py | Imports/exports the new loader so it’s auto-registered/discoverable. |
| tests/unit/datasets/test_beaver_tails_dataset.py | Adds unit tests covering unsafe-only vs all-entries behavior and dataset naming. |
| doc/code/datasets/1_loading_datasets.ipynb | Notebook updated to reflect the new dataset in the available list (but now includes executed outputs/metadata). |
Add remote dataset loader for BeaverTails (PKU-Alignment/BeaverTails), containing 330k+ QA pairs annotated across 14 harm categories for safety alignment research. Filters to unsafe entries by default. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The HF dataset identifier is now a class constant HF_DATASET_NAME instead of a constructor parameter, consistent with other loaders. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
For a 330k-row dataset, this avoids hundreds of thousands of redundant string/list allocations. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
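The commit note above does not say which allocations are avoided. One plausible reading, given that the loader computes its description, source URL, and groups before the loop, is that per-dataset constants are built once and shared by every record. A minimal sketch under that assumption follows; `build_records`, the dataset identifier, and the group names are hypothetical.

```python
def build_records(prompts: list[str], dataset_id: str) -> list[dict]:
    """Build one record per prompt, sharing per-dataset constants by reference."""
    # Built once, before the loop, instead of once per row.
    source_url = f"https://huggingface.co/datasets/{dataset_id}"
    groups = ["group A", "group B"]
    return [{"value": prompt, "source": source_url, "groups": groups} for prompt in prompts]


records = build_records(["p1", "p2", "p3"], "example-org/example-dataset")
```

Every record points at the same `groups` list object, so mutating it through one record changes it for all of them. That trade-off is acceptable only if consumers treat these fields as read-only.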

        description = (
            "BeaverTails is a collection of 330k+ human-LLM QA pairs annotated across 14 harm "
            "categories, designed for safety alignment research. Introduced in 'BeaverTails: "
            "Towards Improved Safety Alignment of LLM via a Human-Preference Dataset' (2023)."
        )

        source_url = f"https://huggingface.co/datasets/{self.HF_DATASET_NAME}"
        groups = ["Institute for Artificial Intelligence", "CFCS, School of Computer Science"]

        seed_prompts = []
        for item in data:
            if self.unsafe_only and item["is_safe"]:
                continue

            harm_categories = [k for k, v in item["category"].items() if v]

            seed_prompts.append(
                SeedPrompt(
                    value=f"{{% raw %}}{item['prompt']}{{% endraw %}}",
                    data_type="text",
                    dataset_name=self.dataset_name,
                    harm_categories=harm_categories,
                    description=description,
                    source=source_url,
                    authors=authors,
                    groups=groups,

The description/docstring emphasizes that BeaverTails contains QA pairs, but the loader currently only emits SeedPrompt values from item['prompt'] and ignores the associated response. To avoid misleading consumers, either (a) explicitly document that only the prompt column is extracted (similar to other dataset loaders), or (b) include the response in SeedPrompt.metadata (or a paired seed type if supported) so the QA relationship isn’t lost.
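Option (b) in the comment above could look roughly like the following. This is a sketch only: `PromptRecord` is a hypothetical stand-in rather than PyRIT's SeedPrompt, and it assumes each row exposes the answer under a `response` key and that a string-to-string metadata mapping is acceptable. None of that is confirmed by the diff.

```python
from dataclasses import dataclass, field


@dataclass
class PromptRecord:
    """Hypothetical stand-in for a seed prompt with free-form metadata."""

    value: str
    metadata: dict[str, str] = field(default_factory=dict)


def to_record(item: dict[str, str]) -> PromptRecord:
    # Keep the answer next to the prompt so the QA pairing is not lost.
    return PromptRecord(value=item["prompt"], metadata={"response": item["response"]})


record = to_record({"prompt": "question text", "response": "answer text"})
```

Option (a) needs no code change: a docstring sentence stating that only the prompt column is extracted would be enough.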

    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Users\\romanlutz\\AppData\\Local\\Temp\\ipykernel_50620\\4021500943.py:10: DeprecationWarning: is_objective parameter is deprecated since 0.13.0. Use seed_type='objective' instead.\n",
      "C:\\Users\\romanlutz\\AppData\\Local\\Temp\\ipykernel_50556\\4021500943.py:10: DeprecationWarning: is_objective parameter is deprecated since 0.13.0. Use seed_type='objective' instead.\n",
      "  memory.get_seeds(harm_categories=[\"illegal\"], is_objective=True)\n"
     ]

This notebook diff still includes captured runtime output with user/machine-specific absolute paths (e.g., C:\\Users\\...\\AppData\\Local\\Temp\\ipykernel_...). Please clear cell outputs (and any execution metadata) before committing so docs remain deterministic and don’t leak local environment details.
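One way to act on the comment above is to strip outputs programmatically before committing. The sketch below uses only the standard library and assumes the notebook follows the nbformat 4 layout, where each code cell carries `outputs` and `execution_count`; `strip_outputs` and the sample notebook are hypothetical.

```python
import json


def strip_outputs(notebook: dict) -> dict:
    """Clear outputs and execution counts from every code cell, in place."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


# Hypothetical in-memory notebook; a real one would be read with json.load.
sample = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"], "metadata": {}},
        {
            "cell_type": "code",
            "source": ["print('hi')"],
            "metadata": {},
            "execution_count": 3,
            "outputs": [{"name": "stdout", "output_type": "stream", "text": ["hi\n"]}],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

cleaned = strip_outputs(sample)
serialized = json.dumps(cleaned, indent=1)
```

The same result is available from the command line with `jupyter nbconvert --clear-output --inplace` followed by the notebook path, which avoids maintaining a custom script.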